In inverse reinforcement learning (IRL), a learning agent infers a reward function encoding the underlying task using demonstrations from experts. However, many existing IRL techniques make the often unrealistic assumption that the agent has access to full information about the environment. We remove this assumption by developing an algorithm for IRL in partially observable Markov decision processes (POMDPs). We address two limitations of existing IRL techniques. First, they require an excessive amount of data due to the information asymmetry between the expert and the learner. Second, most of these IRL techniques require solving the computationally intractable forward problem -- computing an optimal policy given a reward function -- in POMDPs. The developed algorithm reduces the information asymmetry while increasing the data efficiency by incorporating task specifications expressed in temporal logic into IRL. Such specifications may be interpreted as side information available to the learner a priori in addition to the demonstrations. Further, the algorithm avoids a common source of algorithmic complexity by building on causal entropy as the measure of the likelihood of the demonstrations as opposed to entropy. Nevertheless, the resulting problem is nonconvex due to the so-called forward problem. We solve the intrinsic nonconvexity of the forward problem in a scalable manner through a sequential linear programming scheme that guarantees to converge to a locally optimal policy. In a series of examples, including experiments in a high-fidelity Unity simulator, we demonstrate that even with a limited amount of data and POMDPs with tens of thousands of states, our algorithm learns reward functions and policies that satisfy the task while inducing similar behavior to the expert by leveraging the provided side information.
translated by 谷歌翻译
Neural ordinary differential equations (NODEs) -- parametrizations of differential equations using neural networks -- have shown tremendous promise in learning models of unknown continuous-time dynamical systems from data. However, every forward evaluation of a NODE requires numerical integration of the neural network used to capture the system dynamics, making their training prohibitively expensive. Existing works rely on off-the-shelf adaptive step-size numerical integration schemes, which often require an excessive number of evaluations of the underlying dynamics network to obtain sufficient accuracy for training. By contrast, we accelerate the evaluation and the training of NODEs by proposing a data-driven approach to their numerical integration. The proposed Taylor-Lagrange NODEs (TL-NODEs) use a fixed-order Taylor expansion for numerical integration, while also learning to estimate the expansion's approximation error. As a result, the proposed approach achieves the same accuracy as adaptive step-size schemes while employing only low-order Taylor expansions, thus greatly reducing the computational cost necessary to integrate the NODE. A suite of numerical experiments, including modeling dynamical systems, image classification, and density estimation, demonstrate that TL-NODEs can be trained more than an order of magnitude faster than state-of-the-art approaches, without any loss in performance.
translated by 谷歌翻译
Effective inclusion of physics-based knowledge into deep neural network models of dynamical systems can greatly improve data efficiency and generalization. Such a-priori knowledge might arise from physical principles (e.g., conservation laws) or from the system's design (e.g., the Jacobian matrix of a robot), even if large portions of the system dynamics remain unknown. We develop a framework to learn dynamics models from trajectory data while incorporating a-priori system knowledge as inductive bias. More specifically, the proposed framework uses physics-based side information to inform the structure of the neural network itself, and to place constraints on the values of the outputs and the internal states of the model. It represents the system's vector field as a composition of known and unknown functions, the latter of which are parametrized by neural networks. The physics-informed constraints are enforced via the augmented Lagrangian method during the model's training. We experimentally demonstrate the benefits of the proposed approach on a variety of dynamical systems -- including a benchmark suite of robotics environments featuring large state spaces, non-linear dynamics, external forces, contact forces, and control inputs. By exploiting a-priori system knowledge during training, the proposed approach learns to predict the system dynamics two orders of magnitude more accurately than a baseline approach that does not include prior knowledge, given the same training dataset.
translated by 谷歌翻译
我们在非常严重的数据限制下开发一种基于学习的动态系统的控制算法。具体地,该算法只能从单个和正在进行的试验中访问流和嘈杂的数据。它通过有效地利用有关动力学的各种形式的侧面信息来实现这种性能,以降低样本复杂性。这些侧面信息通常来自系统的基本定律和系统的定性特性。更确切地说,该算法大致解决了编码系统所需行为的最佳控制问题。为此,它构建并迭代地改进数据驱动的差分包容,其包含动态的未知矢量字段。在间隔泰勒的方法中使用的差分包容使得能够过度近似于系统可能达到的状态。从理论上讲,我们在具有已知动态的最佳控制的最佳控制的近似解的次优化上建立了界限。我们展示了试验或更侧面信息的时间越长,界限更严格。凭经验,在高保真F-16飞机模拟器和Mujoco的环境中的实验说明,尽管数据稀缺,但算法可以提供与培训数百万环境相互作用的增强学习算法相当的性能。此外,我们表明该算法优于系统识别和模型预测控制的现有技术。
translated by 谷歌翻译
我们研究了逆钢筋学习的问题(IRL),学习代理使用专家演示恢复奖励功能。大多数现有的IRL技术使代理商可以访问有关环境的完整信息,这使得经常不切实际的假设。我们通过在部分可观察到的马尔可夫决策过程(POMDPS)中开发IRL算法来消除此假设。该算法解决了现有技术的若干限制,这些技术不会考虑专家和学习者之间的信息不对称。首先,它采用因果熵作为专家演示的可能性,而不是在大多数现有的IRL技术中熵,避免了算法复杂性的共同来源。其次,它包含以时间逻辑表示的任务规范。除了演示之外,这些规范可以被解释为对学习者可用的侧面信息,并且可以减少信息不对称。然而,由于所谓的前向问题的内在非凸起,即计算最佳政策,在POMDPS中计算最佳政策,所得到的制剂仍然是非凸的。通过顺序凸编程来解决这种非凸起,并介绍几个扩展以以可扩展的方式解决前向问题。这种可扩展性允许计算策略,以牺牲添加的计算成本为代价也越优于无记忆策略。我们证明,即使具有严重限制的数据,算法也会了解满足任务的奖励函数和策略,并通过利用侧面信息并将内存结合到策略中来对专家引起类似的行为。
translated by 谷歌翻译
Event Detection (ED) is the task of identifying and classifying trigger words of event mentions in text. Despite considerable research efforts in recent years for English text, the task of ED in other languages has been significantly less explored. Switching to non-English languages, important research questions for ED include how well existing ED models perform on different languages, how challenging ED is in other languages, and how well ED knowledge and annotation can be transferred across languages. To answer those questions, it is crucial to obtain multilingual ED datasets that provide consistent event annotation for multiple languages. There exist some multilingual ED datasets; however, they tend to cover a handful of languages and mainly focus on popular ones. Many languages are not covered in existing multilingual ED datasets. In addition, the current datasets are often small and not accessible to the public. To overcome those shortcomings, we introduce a new large-scale multilingual dataset for ED (called MINION) that consistently annotates events for 8 different languages; 5 of them have not been supported by existing multilingual datasets. We also perform extensive experiments and analysis to demonstrate the challenges and transferability of ED across languages in MINION that in all call for more research effort in this area.
translated by 谷歌翻译
Event Extraction (EE) is one of the fundamental tasks in Information Extraction (IE) that aims to recognize event mentions and their arguments (i.e., participants) from text. Due to its importance, extensive methods and resources have been developed for Event Extraction. However, one limitation of current research for EE involves the under-exploration for non-English languages in which the lack of high-quality multilingual EE datasets for model training and evaluation has been the main hindrance. To address this limitation, we propose a novel Multilingual Event Extraction dataset (MEE) that provides annotation for more than 50K event mentions in 8 typologically different languages. MEE comprehensively annotates data for entity mentions, event triggers and event arguments. We conduct extensive experiments on the proposed dataset to reveal challenges and opportunities for multilingual EE.
translated by 谷歌翻译
In this paper, we introduce a novel concept of user-entity differential privacy (UeDP) to provide formal privacy protection simultaneously to both sensitive entities in textual data and data owners in learning natural language models (NLMs). To preserve UeDP, we developed a novel algorithm, called UeDP-Alg, optimizing the trade-off between privacy loss and model utility with a tight sensitivity bound derived from seamlessly combining user and sensitive entity sampling processes. An extensive theoretical analysis and evaluation show that our UeDP-Alg outperforms baseline approaches in model utility under the same privacy budget consumption on several NLM tasks, using benchmark datasets.
translated by 谷歌翻译
流视频是创作者与观众分享创意作品的方法之一。在这些视频中,流媒体分享了如何通过在一个或几个用于创意项目的程序中使用各种工具来实现最终目标。为此,可以讨论实现最终目标所需的步骤。因此,这些视频可以提供大量的教育内容,这些内容可用于学习如何使用流媒体使用的工具。但是,缺点之一是,流媒体可能无法为每个步骤提供足够的详细信息。因此,对于学习者来说,可能很难赶上所有步骤。为了减轻此问题,一种解决方案是将流视频与流视频中使用的工具可用的相关教程联系起来。更具体地说,系统可以分析实时流媒体视频的内容,并推荐最相关的教程。由于现有的文档推荐模型无法处理这种情况,因此在这项工作中,我们为实时流程视频的教程建议提供了一个新颖的数据集和模型。我们对拟议的数据集和模型进行了广泛的分析,揭示了该任务的挑战性质。
translated by 谷歌翻译
键形提取是NLP中文档理解的重要任务之一。虽然大多数先前的作品都致力于正式设置,例如书籍,新闻或网络博客,但探索视频成绩单等非正式文本的探索较少。为了解决这一局限性,在这项工作中,我们提出了一种新颖的语料库和方法,用于从Behance平台上流的视频的成绩单中提取钥匙短语。更具体地说,在这项工作中,提出了一种新型的数据增强,以通过从其他域中提取键形提取任务的背景知识来丰富模型。提出的数据集数据集上的广泛实验显示了引入方法的有效性。
translated by 谷歌翻译